VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Transformer [70] has brought significant progress in natural language processing [17,7,54]. The vision transformer [20] also improves a series of computer vision tasks, including image classification [66,88], object detection [8,37], semantic segmentation [80], object tracking [13,16], and video recognition [6,3].
A large part of the current success of deep learning lies in the effectiveness of data, or more precisely, of labelled data. Yet labelling a dataset with human annotations remains costly, especially for videos. While recent methods in the image domain have made it possible to generate meaningful (pseudo-)labels for unlabelled datasets without supervision, this development is missing in the video domain, where learning feature representations is the current focus.
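The self-supervised alternative named in the title, masked autoencoding, sidesteps manual labels by hiding most of a video's patch tokens and training a model to reconstruct them from the visible remainder. A minimal sketch of the masking step is shown below; the patch counts and masking ratio are illustrative assumptions, not the paper's exact configuration, and the actual method also includes an encoder-decoder that this sketch omits.

```python
import numpy as np

def random_tube_mask(n_frames, n_patches, mask_ratio, rng):
    """Sample one spatial mask and repeat it across all frames ("tube"
    masking), so the same patch locations are hidden in every frame.

    Returns a boolean array of shape (n_frames, n_patches), where True
    marks a masked (hidden) patch token.
    """
    n_masked = int(n_patches * mask_ratio)
    spatial = np.zeros(n_patches, dtype=bool)
    spatial[rng.choice(n_patches, size=n_masked, replace=False)] = True
    return np.tile(spatial, (n_frames, 1))

# Illustrative numbers: 8 sampled frames, a 14x14 grid of spatial
# patches per frame, and a high masking ratio (assumed here as 0.9).
rng = np.random.default_rng(0)
mask = random_tube_mask(n_frames=8, n_patches=196, mask_ratio=0.9, rng=rng)
print(mask.shape)        # (8, 196)
print(round(mask.mean(), 3))  # close to 0.9: most tokens are hidden
```

Repeating one spatial mask across frames, rather than masking each frame independently, prevents the model from trivially copying a patch from a neighbouring frame where it happens to be visible.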